Hierarchical Tandem Features for ASR in Mandarin
Abstract
We apply multilayer perceptron (MLP) based hierarchical Tandem features to large vocabulary continuous speech recognition in Mandarin. Hierarchical Tandem features are estimated using a cascade of two MLP classifiers that are trained independently. The first classifier is trained on perceptual linear predictive coefficients with a 90 ms temporal context. The second classifier is trained on the phonetic class conditional probabilities estimated by the first MLP, but with a longer temporal context of about 150 ms. Experiments on the Mandarin DARPA GALE eval06 dataset show a significant reduction (about 7.6% relative) in character error rate when hierarchical Tandem features are used in place of conventional Tandem features.
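The two-stage cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 10 ms frame shift, the 13-dimensional PLP vectors, the five phone classes, the single hidden layer, and the `TinyMLP` and `stack_context` names are all assumptions made for illustration, and the weights are random rather than trained. The sketch only shows how the data flows: PLP frames are stacked over about 90 ms for the first MLP, and the first MLP's posteriors are stacked over about 150 ms for the second.

```python
import math
import random


def stack_context(frames, width):
    """Concatenate each frame with its neighbours in a centred window of
    `width` frames (odd), repeating the edge frames as padding."""
    half = width // 2
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - half, t + half + 1):
            window.extend(frames[min(max(k, 0), last)])
        stacked.append(window)
    return stacked


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyMLP:
    """Hypothetical one-hidden-layer MLP with random (untrained) weights."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]

    def posteriors(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        return softmax([sum(w * h for w, h in zip(row, hidden)) for row in self.w2])


FRAME_SHIFT_MS = 10  # assumption: the abstract gives contexts in ms only
PLP_DIM = 13         # assumption: toy PLP dimensionality
N_PHONES = 5         # assumption: toy number of phonetic classes

rng = random.Random(0)
plp = [[rng.gauss(0, 1) for _ in range(PLP_DIM)] for _ in range(30)]  # fake utterance

ctx1 = 90 // FRAME_SHIFT_MS    # 9 frames of PLP context for the first MLP
ctx2 = 150 // FRAME_SHIFT_MS   # 15 frames of posterior context for the second MLP

# Stage 1: PLP features with about 90 ms of context -> phone posteriors.
mlp1 = TinyMLP(PLP_DIM * ctx1, 8, N_PHONES, rng)
post1 = [mlp1.posteriors(x) for x in stack_context(plp, ctx1)]

# Stage 2: stage-1 posteriors with about 150 ms of context -> refined posteriors.
mlp2 = TinyMLP(N_PHONES * ctx2, 8, N_PHONES, rng)
post2 = [mlp2.posteriors(x) for x in stack_context(post1, ctx2)]
```

In this sketch `post2` holds one posterior vector per input frame. Each stage takes fixed inputs, which is consistent with the two classifiers being trained independently: the second MLP can be trained only after the first has produced its posteriors.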
Similar resources
Hierarchical processing of the modulation spectrum for GALE Mandarin LVCSR system
This paper aims at investigating the use of TANDEM features based on hierarchical processing of the modulation spectrum. The study is done in the framework of the GALE project for recognition of Mandarin Broadcast data. We describe the improvements obtained using the hierarchical processing and the addition of features like pitch and short-term critical band energy. Results are consistent with ...
Tonal articulatory feature for Mandarin and its application to conversational LVCSR
This paper presents our recent work on the development of a tonal Articulatory Feature (AF) for Mandarin and its application to conversational LVCSR. Motivated by the theory of Mandarin phonology, eight features for classifying the acoustic units and one feature for classifying the tone are investigated and constructed in the paper, and the AF-based tandem approach is used to improve speech rec...
A comparative large scale study of MLP features for Mandarin ASR
MLP based front-ends have shown significant complementary properties to conventional spectral features. As part of the DARPA GALE program, different MLP features were developed for Mandarin ASR. In this paper, all the proposed front-ends are compared in a systematic manner and we extensively investigate the scalability of these features in terms of the amount of training data (from 100 hours to 16...
Incorporating Pitch Features for Tone Modeling in Automatic Recognition of Mandarin Chinese
Tone plays a fundamental role in Mandarin Chinese, as it plays a lexical role in determining the meanings of words in spoken Mandarin. For example, these two sentences 我喜欢马 (I like horses) and 我喜欢骂 (I like to scold) differ only in the tone carried by the last syllable. Thus, the inclusion of tone-related information through analysis of pitch data should improve the performance of automatic speech...
Incorporating tone-related MLP posteriors in the feature representation for Mandarin ASR
Tone has a crucial role in Mandarin speech in distinguishing ambiguous words. In most state-of-the-art Mandarin automatic speech recognition systems, tonal acoustic units are used and F0 features are appended to the spectral features (MFCC/PLP). However, a tone depends on the F0 contour of a time span much longer than a frame. Ideally, systems would compute the frame-level likelihood of a tone u...